Automatic Audio Chord Recognition With MIDI-Trained Deep Feature and BLSTM-CRF Sequence Decoding Model

#survey #Chord_Recognition #2018 #TASLP ( #ICASSP )

ShuKumata.icon

Authors: Yiming Wu, Wei Li

Research institute:

The problem the authors try to solve:

Link to This Paper: https://ieeexplore.ieee.org/document/8523662

One-page summary

0. In a nutshell

http://xiao-ming.digick.jp/mir/post-621/

Abstract

With the advances of machine learning technologies, data-driven feature extraction and sequence modeling approaches are being widely explored for automatic chord recognition tasks. Currently, there is a bottleneck in the amount of enough annotated data for training robust acoustic models, as hand-annotating time-synchronized chord labels requires professional musical skills and considerable labor. To cope with this limitation, in this paper, we propose a convolutional neural network (CNN) based deep feature extractor, which is trained on a large set of time-synchronized musical instrument digital interface (MIDI)-audio data pairs and can robustly estimate pitch class activations of real-world music audio recordings. The CNN feature extractor plus a bidirectional long short-term memory conditional random field decoding model forms the proposed hybrid system for automatic chord recognition. Experiments show that the proposed model is compatible for both regular major/minor triad chord classification and larger vocabulary chord recognition, and outperforms other state-of-the-art chord recognition systems.

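The abstract's decoding stage pairs a BLSTM with a CRF; at inference time a linear-chain CRF picks the chord sequence with the Viterbi algorithm. Below is a minimal sketch of that search, assuming the BLSTM has already produced per-frame emission scores and the CRF has a learned transition matrix. The function name, the 3-class toy vocabulary and every number are hypothetical illustrations, not values from the paper.

```python
import numpy as np


def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    """Highest-scoring label path of a linear-chain CRF.

    emissions:   (T, K) per-frame scores for K chord classes (assumed BLSTM output).
    transitions: (K, K) score for moving from class i at frame t-1 to class j at t.
    """
    T, K = emissions.shape
    score = emissions[0].copy()            # best path score ending in each class at t=0
    backptr = np.zeros((T, K), dtype=int)  # best predecessor class for each (t, class)
    for t in range(1, T):
        # cand[i, j] = best path ending in class i at t-1, then transition i -> j
        cand = score[:, None] + transitions
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    # Start from the best final class and follow the back-pointers
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Hypothetical toy vocabulary: 0 = N (no chord), 1 = C:maj, 2 = A:min.
emissions = np.array([
    [0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 1.0, 1.2],  # one noisy frame slightly prefers A:min
    [0.0, 2.0, 0.0],
])
transitions = np.full((3, 3), -1.0)  # changing chords costs 1
np.fill_diagonal(transitions, 0.0)   # staying on the same chord is free

frame_wise = emissions.argmax(axis=1).tolist()
decoded = viterbi_decode(emissions, transitions)
print(frame_wise, decoded)
```

Frame-wise argmax flips to A:min on the noisy frame, while the transition penalty makes the decoded path stay on C:maj. This smoothing effect is the usual motivation for putting a CRF on top of frame-level classifiers.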

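1. What is it? What problem does it address?

2. What makes it stronger than prior work?

3. What is the key idea of the technique or method?

4. How was its effectiveness verified?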

Datasets: Lakh MIDI, Isophonics, RWC-Popular
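The abstract says the CNN extractor is trained on time-synchronized MIDI-audio pairs to estimate pitch-class activations, and Lakh MIDI is the MIDI collection listed above. A minimal sketch of how frame-level pitch-class targets could be derived from MIDI note events follows, assuming notes are given as (onset, offset, MIDI pitch) tuples and a hypothetical frame rate; the paper's actual target representation and frame rate may differ.

```python
import numpy as np


def pitch_class_targets(notes, duration_sec, frame_rate=10.0):
    """Frame-level 12-dim pitch-class activation targets from MIDI note events.

    notes: iterable of (onset_sec, offset_sec, midi_pitch) tuples (assumed format).
    Returns an (n_frames, 12) array of 0/1 values; column 0 = C, 1 = C#, ..., 11 = B.
    Any frame a note overlaps is marked active for that note's pitch class.
    """
    n_frames = int(np.ceil(duration_sec * frame_rate))
    target = np.zeros((n_frames, 12), dtype=np.float32)
    for onset, offset, pitch in notes:
        start = int(np.floor(onset * frame_rate))
        end = min(int(np.ceil(offset * frame_rate)), n_frames)
        target[start:end, pitch % 12] = 1.0  # MIDI pitch mod 12 folds away the octave
    return target


# C major triad (C4, E4, G4) for 0-1 s, then A minor (A3, C4, E4) for 1-2 s.
notes = [(0.0, 1.0, 60), (0.0, 1.0, 64), (0.0, 1.0, 67),
         (1.0, 2.0, 57), (1.0, 2.0, 60), (1.0, 2.0, 64)]
targets = pitch_class_targets(notes, duration_sec=2.0, frame_rate=2.0)
print(targets.astype(int))
```

Pairing such targets with features of the synthesized audio gives supervised training data without hand-annotated chord labels, which is the bottleneck the abstract describes.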

5. Is there any discussion?

6. What papers should I read next?

7. Notes

8. Comments

Links

http://xiao-ming.digick.jp/mir/post-621/

https://github.com/Xiao-Ming/ChordRecognitionMIDITrainedExtractor